ABSTRACT

The intent of the proposed effort is the examination of the impact of the elements 
of parallel architectures on the performance realized in a parallel computation. To this 
end, three major projects are developed: a language for the expression of high-level 
parallelism, a statistical technique for the synthesis of multicomputer interconnection 
networks based upon performance prediction, and a queueing model for the analysis of
shared memory hierarchies.


INTRODUCTION 

Parallel computer architectures, both commercial and theoretical, are proliferating 
as the speed advantages of parallel computation are recognized. New architecture 
designs may be classified as fundamentally "traditional," in which an old design is given a
slight modification, or "radical," in which the standard approach is discarded in favor of an
entirely new design. The fact remains that, because of the lack of a well-developed,
disciplined approach, computer architecture design today is very much trial and error. A 
design is produced and then evaluated to determine how good it is. 

This study considers a variety of parallel architectures and selects two 
architectural elements having profound impact on performance, one from each of two 
diverse architectural classes. Models are developed by which the performance for these 
architectures may be predicted. In one case the performance of interconnection 
networks, described by graphical properties, may be predicted through a statistical 
analysis of the data collected about existing networks. In the other case, the performance 
of a shared memory multiprocessor with a memory hierarchy component is modeled
analytically.


Relative to all large-grained parallel computation, a high-level parallel language,
EASY-FLOW, is developed to assist with the expression of parallel tasks in the context of
traditional programming languages.


A HIGH-LEVEL PARALLEL LANGUAGE

Software for computers offering parallel computation must provide the level of 
parallelism specific for the target architecture, designating a point on the spectrum from a 
high-level multiple task model to low-level bit operations. The effort in this project is 
directed toward the former parallel model which is specific to the message passing, 
multicomputer architecture. A high-level parallel language is developed based upon the 
data flow schema of data-dependency directed execution, incorporating the three 
fundamental models of control directed execution: sequencing, branching, and looping. 


Data flow computing is based upon the notion that the execution of a computation 
may be initiated by the availability of data, instead of by a sequence determined from the 
"flow of control." Data values "flow" between computations, triggering executions 
which consume input data and produce output data as results. Results that are produced 
at one computation may be consumed at a subsequent computation, establishing a data 
dependency between the two computations. Computations that are data dependent are 
constrained to execute in sequence. Other computations not so constrained may be
executed in parallel.
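
The firing rule described above can be sketched in a few lines of Python. This is an illustration only, not EASY-FLOW: the function run_dataflow, the node table layout, and the three example computations are hypothetical names introduced here. A node fires once all of its inputs have values; nodes that become ready in the same pass share no data dependency and could, in principle, execute in parallel.

```python
from typing import Callable, Dict, List, Tuple

# Each node is (input names, output name, function). All names are hypothetical.
Node = Tuple[List[str], str, Callable[..., float]]

def run_dataflow(nodes: Dict[str, Node], initial: Dict[str, float]):
    """Fire nodes as their input data becomes available.

    Returns the final value store and a list of "waves". Nodes in the same
    wave do not depend on each other and could run in parallel.
    """
    values = dict(initial)
    pending = dict(nodes)
    waves = []
    while pending:
        # A node is ready when every one of its inputs already has a value.
        ready = sorted(name for name, (ins, _, _) in pending.items()
                       if all(i in values for i in ins))
        if not ready:
            raise ValueError("remaining nodes wait on data that never arrives")
        results = {}
        for name in ready:
            ins, out, fn = pending.pop(name)
            results[out] = fn(*(values[i] for i in ins))
        values.update(results)  # results become visible to the next wave
        waves.append(ready)
    return values, waves

nodes = {
    "add": (["a", "b"], "s", lambda a, b: a + b),
    "mul": (["a", "b"], "p", lambda a, b: a * b),
    "div": (["p", "s"], "q", lambda p, s: p / s),
}
values, waves = run_dataflow(nodes, {"a": 6.0, "b": 2.0})
print(waves)        # "add" and "mul" share a wave; "div" must wait for both
print(values["q"])
```

Here "div" consumes the results of "add" and "mul", so it is constrained to follow them, while "add" and "mul" are unconstrained with respect to each other.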

The objectives of this language design project are to: (1) develop a language that 
requires little retraining of conventional language programmers, (2) provide for the reuse 
of existing software libraries, and (3) expose potential parallelism both implicitly and 
explicitly at varying levels of procedural computation. To this end the EASY-FLOW 
language is developed. 

The basic unit of computation in EASY-FLOW is the atomic unit (atomic since it 
has no substructure) supplied by a subprogram written in a conventional high-level 
language (e.g. FORTRAN, C). The program notation provided by EASY-FLOW gives a 
superstructure located conceptually above the subprograms and relates them by explicitly 
expressed data dependencies. Units, other than atomic, may have a substructure 
consisting of other units related by data dependencies. 

The EASY-FLOW notation provides information which may be used in 
scheduling the execution of units. Units which are not constrained by data dependencies 
may be scheduled to execute in parallel or overlapping in time. The data dependencies 
are made clear by the "single assignment" rule: any name in an EASY-FLOW program 
is associated with only one value throughout execution. As an exception to this rule, the
looping construct allows for the convenient update of a name used in iteration, but this
may be done only in specifically isolated instances which are clearly marked in the
program.
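
A check for the single assignment rule might look like the following Python sketch. The representation is an assumption made for illustration: a program is reduced to a list of (name, marked) pairs, where marked is true only for an update made through the clearly marked looping exception. The function name and the sample list are hypothetical and are not taken from the EASY-FLOW compiler.

```python
from typing import Iterable, List, Tuple

def single_assignment_violations(
        assignments: Iterable[Tuple[str, bool]]) -> List[str]:
    """Return names that receive a second value outside a marked update.

    Each item is (name, marked). 'marked' stands for an update made through
    the looping construct, the one exception the rule permits.
    """
    seen = set()
    violations = []
    for name, marked in assignments:
        if name in seen and not marked and name not in violations:
            violations.append(name)
        seen.add(name)
    return violations

# Hypothetical sequence of assignments: "i" is updated through the marked
# loop exception, while "y" is assigned twice with no such marking.
program = [("x", False), ("y", False), ("i", False), ("i", True), ("y", False)]
print(single_assignment_violations(program))
```

Because every other name has exactly one defining point, the producer of each value is unambiguous, which is what makes the data dependencies clear to a scheduler.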

While the EASY-FLOW statements allow for the scheduling of units or tasks, the 
atomic units provide for the computation specified in the program. Data values as 
parameters are passed by assigning their values to actual parameter variables to be used 
in a subprogram call. Upon returning from the call, assignments are made from the 
returning parameters to EASY-FLOW variable names, thus shielding the EASY-FLOW 
variables from alteration within the subprogram. 

An EASY-FLOW compiler has been written that produces sequential FORTRAN 
code (in order to determine feasibility) for use with FORTRAN subprograms. The data 
flow graph produced by the compiler is made sequential through application of a 
topological sort. A compiler to produce parallel FORTRAN code for a Transputer
system is currently in development.
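
The step of making a data flow graph sequential can be illustrated with a topological sort in Python. The sketch below uses Kahn's algorithm, which is one common choice; the text does not say which algorithm the compiler uses. The unit names and the dependency table are hypothetical, and every unit is assumed to appear as a key in the table.

```python
from collections import deque
from typing import Dict, List

def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order units so that each follows the units whose outputs it consumes.

    deps[u] lists the units that u depends on. Kahn's algorithm is used here
    as an assumption; any valid topological sort would serve.
    """
    indegree = {u: len(set(ps)) for u, ps in deps.items()}
    consumers: Dict[str, List[str]] = {u: [] for u in deps}
    for u, ps in deps.items():
        for p in set(ps):
            consumers[p].append(u)
    # Start with the units that depend on nothing.
    queue = deque(sorted(u for u, d in indegree.items() if d == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(consumers[u]):
            indegree[v] -= 1
            if indegree[v] == 0:  # all producers of v have been emitted
                queue.append(v)
    if len(order) != len(deps):
        raise ValueError("cycle in data dependencies")
    return order

deps = {
    "read": [],
    "scale": ["read"],
    "offset": ["read"],
    "combine": ["scale", "offset"],
}
order = topological_order(deps)
print(order)
```

In this example "scale" and "offset" are independent of each other, so either may come first in a sequential ordering; a parallel code generator could instead schedule them concurrently.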


MULTICOMPUTER NETWORK SYNTHESIS 

Inter-task communication in a multiprocess computation may dominate 
processing time and determine in large part the performance realized. In a multicomputer 
system, the interconnection network linking the processing elements provides the 
pathways for messages passed between tasks residing on separate processors. An 
interconnection network that closely fits the pattern of interprocess communication will 
clearly assist in alleviating the communications overhead. The alternative situation, one 
in which the communications requirements of the application must be mapped to a 
dissimilar interconnection network by mapping multiple edges to single physical 
communication links or mapping single edges to paths passing through multiple 
processing elements, may cause delays due to resulting bottlenecks.

Previous interconnection network designs have incorporated a regular network
which matches to a degree the pattern of intertask communications. The selection of the 
network structure has been an intuitive decision based upon the experience of the 
designer. As an alternative, this project examines the use of statistical and optimization 
techniques in the modeling and synthesis of interconnection networks. This
approach offers a means of comparing elements of diverse interconnection network
designs that allows the synthesis of networks by selection of the best elements
of existing designs and other, perhaps hybrid, networks that may offer better
performance.

A multidimensional solution space is constructed by considering the performance 
(the dependent variable) of existing networks along with both quantitative and qualitative 
characteristics (the independent variables) of graphs. Such characteristics may include 
graph size, average degree, diameter, radius, girth, node-connectivity, edge-connectivity, 
minimum dominating set size, and maximum number of prime node and edge cutsets. 
Network performance may be described by the average message delay or the ratio of 
completion rate to network connection cost. By using the method of stepwise
linear regression, a polynomial surface is developed in the solution space. Optimization
techniques such as response surface methodology or steepest ascent path may then be 
used to optimize the performance variable from the polynomial surface. 
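
Several of the graph characteristics listed above are simple to compute, and a short Python sketch may make the independent variables concrete. It derives size, average degree, diameter, and radius by breadth-first search. The function names and the two example topologies (an 8-node ring and a 3-dimensional hypercube) are choices made here for illustration; they are not data from the study.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph

def eccentricities(g: Graph) -> Dict[int, int]:
    """Greatest shortest-path distance from each node, found by BFS."""
    ecc = {}
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(g):
            raise ValueError("graph is not connected")
        ecc[src] = max(dist.values())
    return ecc

def graph_features(g: Graph) -> Dict[str, float]:
    """A few candidate independent variables for one network."""
    ecc = eccentricities(g)
    return {
        "size": len(g),
        "average_degree": sum(len(n) for n in g.values()) / len(g),
        "diameter": max(ecc.values()),
        "radius": min(ecc.values()),
    }

def ring(n: int) -> Graph:
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def hypercube(d: int) -> Graph:
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(1 << d)}

print(graph_features(ring(8)))
print(graph_features(hypercube(3)))
```

Both networks have eight nodes, yet they differ in degree and diameter; a row of such values per network, paired with a measured performance figure, is the kind of observation a regression would be fitted to.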

Screening of the relatively large number of independent variables may eliminate 
those that contribute little to the dependent variable value. An optimization technique is 
used to determine local or global points of "optimum" network performance. An
"optimum" point is an indication of an "ideal" interconnection network, based upon the
values of the various independent variables. The gradient vector for an optimum point 
which does not have corresponding realistically-valued independent variable values may 
indicate general trends or direction(s) of greatest increase in the value of the dependent 
(performance) variable. 
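
The role of the gradient can be illustrated with a small steepest ascent sketch in Python. Everything specific here is an assumption: the second-order surface, its coefficients, the two unnamed independent variables, and the step size are invented for illustration and do not come from any fitted model in this study. The gradient is estimated by central differences, and each step moves in the direction of greatest increase of the performance variable.

```python
from typing import Callable, List, Sequence

def numerical_gradient(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def steepest_ascent(f: Callable[[Sequence[float]], float],
                    start: Sequence[float], step: float = 0.1,
                    tol: float = 1e-8, max_iter: int = 10_000) -> List[float]:
    """Follow the gradient uphill until it (nearly) vanishes."""
    x = list(start)
    for _ in range(max_iter):
        g = numerical_gradient(f, x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x

def surface(x: Sequence[float]) -> float:
    # Hypothetical fitted polynomial surface; coefficients are illustrative.
    x1, x2 = x
    return -1.0 + 6.0 * x1 + 4.0 * x2 - x1 ** 2 - 2.0 * x2 ** 2

best = steepest_ascent(surface, [0.0, 0.0])
print(best, surface(best))
```

For this made-up surface the ascent settles near (3, 1). When such a point does not correspond to any realizable network, the gradient evaluated at a realizable point still indicates which characteristics to increase or decrease.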

The optimization process produces a ranking of desirable characteristics and their 
suitable levels. "Optimal" network synthesis will not follow directly from this. The 
information in the ranking will assist the designer in the design process, perhaps 
indicating unconventional directions in the choice of network elements.


QUEUEING MODEL FOR SHARED MEMORY HIERARCHIES

Interference between processors issuing requests to a shared memory may be a 
major factor in limiting performance in a shared memory multiprocessor system. 
Simultaneous requests to a single memory module cannot be serviced simultaneously. 
Only one request may be served, requiring the others to wait under some queueing 
scheme. Memory requests waiting in a queue translate to processors blocked from 
computation and a consequential degradation in achieved performance. 

The queueing model presented is one for a hierarchy of memory modules. A 
hierarchy represents a realistic view of shared memory organization, with relatively 
small, high speed memories at the direct access level and larger, slower response 
memories organized at more remote levels of access. 

An analytical model is developed, based upon a general queueing model. The 
mean waiting time for a request from a processor to be served at a memory module is 
calculated, including the time spent in a queue awaiting service and the time required to 
retrieve the data from the memory module. Queueing delay is based on an estimate of
queue length and the average service time for a memory module access. From this the 
expected number of busy memories is computed and used as the measure of system 
performance. Analytic results are compared with simulated results for several systems
differing in the relative numbers of processors and memory modules, and the correlation
is found to be high.
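
The idea of comparing an analytic estimate of busy memories with simulation can be shown on a much simpler case than the hierarchical model described here. The Python sketch below is a stand-in, not the model of this study: it assumes a single level of m memory modules, n processors that each issue one uniformly random request per cycle, and no queueing (a request that loses the conflict is simply dropped). Under those assumptions the expected number of busy modules is m(1 - (1 - 1/m)^n). The function names, cycle count, and seed are hypothetical.

```python
import random

def expected_busy_memories(n_processors: int, n_modules: int) -> float:
    """Analytic estimate m * (1 - (1 - 1/m)**n) for the simplified case."""
    m = n_modules
    return m * (1.0 - (1.0 - 1.0 / m) ** n_processors)

def simulate_busy_memories(n_processors: int, n_modules: int,
                           cycles: int = 20_000, seed: int = 1) -> float:
    """Average number of distinct modules requested per cycle."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        # Each processor picks a module; a module is busy if anyone picked it.
        requested = {rng.randrange(n_modules) for _ in range(n_processors)}
        total += len(requested)
    return total / cycles

for n, m in [(4, 4), (8, 4), (4, 8)]:
    analytic = expected_busy_memories(n, m)
    simulated = simulate_busy_memories(n, m)
    print(n, m, round(analytic, 3), round(simulated, 3))
```

Varying the relative numbers of processors and modules, as in the loop above, mirrors the kind of comparison described in the text, although the queueing delay and the memory hierarchy are deliberately left out of this sketch.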


APPENDIX A-EASY-FLOW GRAMMAR


Modified 2/1/89 


1)  <program> ::= <unit>

2)  <unit> ::= unit <id> :
                   <possible declarations>
                   input : <list>
                   <body of unit>
                   output : <list>
               endunit <id>

Note: input and output are important enough, it was decided to require them even if no explicit I/O is
called for.

Semantics: Make note of the unit id. Record declarations in symbol table (associate them with this
unit). Record input list and output list, associated with this unit.

3)  <possible declarations> ::= declare : <declarations list> | nil

4)  <list> ::= <id> <more list> | nil

5)  <more list> ::= , <id> <more list> | nil

6)  <body of unit> ::= <subprogram> |
                       <if unit> |
                       <iter unit> |
                       <distribute unit> |
                       <unit set>

7)  <subprogram> ::= into : <pairs> <subprogram call> outof : <pairs>

8)  <if unit> ::= if <boolean exp>
                      then <unit set>
                      else <unit set>

Note: <boolean exp> is treated as a subprogram call for now (see grammar). "Unit sets" used here
because unit would require another input/output pair and this is already provided by the enclosing unit.

9)  <iter unit> ::= iter <boolean exp>
                        do <unit set>
                        reassign <pairs>

Semantics: Process boolean expression same as above (see <if unit>).

10) <distribute unit> ::= distribute <id> = <range>
                              <unit set>

11) <unit set> ::= <unit> <unit set> | nil

Note: a <unit set> may be nil.

12) <boolean expression> ::= <subprogram>

13) <pairs> ::= <match> <pairs> | nil

Note: <pairs> may be nil.

14) <range> ::= <const> .. <const>

15) <match> ::= <variable id> => <variable id>

16) <subprogram call> ::= subprogram <id> <optional parameters>

17) <optional parameters> ::= ( <svariable list> ) | nil

18) <declarations list> ::= real <dvariable list> <declarations list> |
                            integer <dvariable list> <declarations list> |
                            boolean <dvariable list> <declarations list> |
                            double precision <dvariable list> <declarations list> | nil

Note: The nil above allows declare: <nil>. This is OK to emphasize no declarations!

19) <dvariable list> ::= <dvariable id> <more dvariable list>

20) <dvariable id> ::= <id> <optional dimension list>

21) <more dvariable list> ::= , <dvariable id> <more dvariable list> | nil

22) <svariable list> ::= <variable id> <more svariable list> | nil

23) <more svariable list> ::= , <variable id> <more svariable list> | nil

24) <variable id> ::= <id> <optional subscript list>

25) <optional subscript list> ::= ( <subscript list> ) | nil

26) <subscript list> ::= <cid> <more subscript list>

27) <more subscript list> ::= , <cid> <more subscript list> | nil

28) <cid> ::= <variable id> | <const>

29) <optional dimension list> ::= ( <dimension list> ) | nil

30) <dimension list> ::= <const> <more dimension list>

31) <more dimension list> ::= , <const> <more dimension list> | nil

Note: Three kinds of variable list are provided for:

1. Used in declarations and allows only constant dimensions. <dvariable list>

2. Used in input/output lists in units. For now, no subscripts are allowed. <list>

3. Other places allow subscripts. <svariable list> (most general).

Note: <variable id> allows any subscripts (not only constants). 

<id> is any unsubscripted id (or simple id). 
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